fix(query): Remove duplicated rows when datasets shared Publicly in the hub page by Mrudhulraj · Pull Request #6016 · apache/texera

Mrudhulraj · 2026-06-29T02:18:43Z

What changes were proposed in this PR?

Issue - Duplicate datasets on hub landing page / hub search

Symptom: A user creates a dataset, makes it public, and grants another user explicit access. When the grantee browses the hub, the dataset appears twice in the search results.

Root cause:

DatasetSearchQueryBuilder.constructFromClause produced this SQL:

path: amber\src\main\scala\org\apache\texera\web\resource\dashboard\DatasetSearchQueryBuilder.scala:72

  SELECT DISTINCT ...
  FROM dataset
  LEFT JOIN dataset_user_access ON dua.did = dataset.did
  LEFT JOIN "user" ON ...
  WHERE (dua.uid = <ME>) OR (dataset.is_public = true)

For a dataset that is both public AND explicitly shared with the user, the LEFT JOIN produces one row per matching dataset_user_access row and the OR makes both branches true.
This applies similarly to worflows too.

Fix 1 — DatasetSearchQueryBuilder.constructFromClause

Move the UID filter from the WHERE clause into the JOIN's ON clause so each dataset produces at most one joined row, and force the JOIN to FALSE when uid == null so the SELECT still references a valid
table.

  val baseJoin = DATASET
    .leftJoin(DATASET_USER_ACCESS)
    .on(DATASET_USER_ACCESS.DID.eq(DATASET.DID))
    .**and**(if (uid == null) DSL.**falseCondition**() else DATASET_USER_ACCESS.UID.eq(uid))
    .leftJoin(USER)
    .on(USER.UID.eq(DATASET.OWNER_UID))

  val condition: Condition =
    if (uid == null) {
      DATASET.IS_PUBLIC.eq(true)
    } else if (includePublic) {
      DATASET.IS_PUBLIC.eq(true).or(DATASET_USER_ACCESS.UID.isNotNull)
    } else {
      DATASET_USER_ACCESS.UID.isNotNull
    }

  baseJoin.where(condition)

Why AND FALSE for uid == null?
The SELECT references DATASET_USER_ACCESS.PRIVILEGE. Without dataset_user_access in the FROM, DB throws missing
FROM-clause entry for table "dataset_user_access". AND FALSE keeps the table in the FROM while making the JOIN yield NULL access columns — which is the correct semantic for "no explicit grant".

Behavior matrix:

uid	includePublic	Matched datasets
null	(n/a)	Public only
not null	false	Datasets with explicit access of logged-in user only
not null	true	Public + logged-in explicit access (no duplicates)

Any related issues, documentation, discussions?

Refs #5957

How was this PR tested?

Tested manually with database checks and UI workflow testing.

Was this PR authored or co-authored using generative AI tooling?

No AI tools were used in the process.

github-actions · 2026-06-29T02:18:54Z

👋 Thanks for opening this pull request, @Mrudhulraj!

It looks like the pull request description doesn't quite follow our template yet:

The What changes were proposed in this PR? section is empty; please fill it in.

Filling out the template helps reviewers understand and triage your contribution faster. Please edit the description to complete it. This message will disappear automatically once the template is followed.

You can find the template prompts by editing the description, or see CONTRIBUTING.md for the full contribution flow.

github-actions · 2026-06-29T02:18:56Z

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

Contributors with relevant context: @xuang7
You can notify them by mentioning @xuang7 in a comment.

carloea2 · 2026-06-29T04:02:19Z

Add some specs to both PRs

carloea2 · 2026-06-29T04:25:55Z

Thanks, this is much more focused now.

The core fix looks right to me: filtering DATASET_USER_ACCESS by the current uid in the JOIN should prevent the public-dataset join fan-out, since dataset_user_access is keyed by (did, uid).

I have two requests:

Please add a null-safe fallback for accessPrivilege.
After this change, public datasets without an explicit access row for the current user will have no joined DATASET_USER_ACCESS.PRIVILEGE, so record.get(...) can be null. The frontend type expects "READ" | "WRITE" | "NONE", so I think we should return PrivilegeEnum.NONE instead of null.

Example:

val accessPrivilege =
Option(record.get(DATASET_USER_ACCESS.PRIVILEGE, classOf[PrivilegeEnum]))
.getOrElse(PrivilegeEnum.NONE)
Please add a small spec for the duplicate case:
- dataset is public
- dataset is explicitly shared with the current user
- search with includePublic=true
- result contains exactly one dataset row, with the current user's privilege

Optional cleanup: the condition rewrite is logically okay, but the PR could be smaller if we keep the existing WHERE logic and only change the JOIN. Also, the new comments have a couple typos and “join is skipped” is slightly misleading because the join remains present but is forced to match no rows when uid == null.

Mrudhulraj · 2026-06-29T22:16:08Z

I have fixed it and added appropriate comments.
Final fix: 594a697

codecov-commenter · 2026-06-29T23:58:40Z

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

github-actions · 2026-06-30T00:01:54Z

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 5 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main 1a58433 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

	config	throughput	MB/s	latency	max Δ latest / 7d
🔴	bs=10 sw=10 sl=64	377	0.23	25,241/41,163/41,163 us	🔴 +21.5% / 🔴 +173.1%
🟢	bs=100 sw=10 sl=64	810	0.495	123,125/137,036/137,036 us	🟢 -14.0% / 🔴 +27.4%
⚪	bs=1000 sw=10 sl=64	932	0.569	1,064,666/1,125,263/1,125,263 us	⚪ within ±5% / 🔴 +9.4%

Baseline details

Latest main 1a58433 from same runner

config	metric	PR	latest main	7d avg	Δ latest	Δ 7d
bs=10 sw=10 sl=64	throughput	377 tuples/sec	405 tuples/sec	777.62 tuples/sec	-6.9%	-51.5%
bs=10 sw=10 sl=64	MB/s	0.23 MB/s	0.247 MB/s	0.475 MB/s	-6.9%	-51.5%
bs=10 sw=10 sl=64	p50	25,241 us	23,715 us	12,612 us	+6.4%	+100.1%
bs=10 sw=10 sl=64	p95	41,163 us	33,887 us	15,070 us	+21.5%	+173.1%
bs=10 sw=10 sl=64	p99	41,163 us	33,887 us	18,360 us	+21.5%	+124.2%
bs=100 sw=10 sl=64	throughput	810 tuples/sec	797 tuples/sec	988.31 tuples/sec	+1.6%	-18.0%
bs=100 sw=10 sl=64	MB/s	0.495 MB/s	0.486 MB/s	0.603 MB/s	+1.9%	-17.9%
bs=100 sw=10 sl=64	p50	123,125 us	123,089 us	101,066 us	+0.0%	+21.8%
bs=100 sw=10 sl=64	p95	137,036 us	159,269 us	107,594 us	-14.0%	+27.4%
bs=100 sw=10 sl=64	p99	137,036 us	159,269 us	115,830 us	-14.0%	+18.3%
bs=1000 sw=10 sl=64	throughput	932 tuples/sec	918 tuples/sec	1,019 tuples/sec	+1.5%	-8.6%
bs=1000 sw=10 sl=64	MB/s	0.569 MB/s	0.56 MB/s	0.622 MB/s	+1.6%	-8.6%
bs=1000 sw=10 sl=64	p50	1,064,666 us	1,085,862 us	986,982 us	-2.0%	+7.9%
bs=1000 sw=10 sl=64	p95	1,125,263 us	1,139,586 us	1,028,491 us	-1.3%	+9.4%
bs=1000 sw=10 sl=64	p99	1,125,263 us	1,139,586 us	1,058,493 us	-1.3%	+6.3%

Raw CSV

config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,529.80,200,128000,377,0.230,25241.47,41162.51,41162.51
1,100,10,64,20,2468.11,2000,1280000,810,0.495,123124.61,137036.14,137036.14
2,1000,10,64,20,21459.36,20000,12800000,932,0.569,1064666.45,1125262.78,1125262.78

…n the hub page fix(dashboard): Remove duplicated rows when dataset shared Publicly in the hub page

…che#6015) ### What changes were proposed in this PR? `AGENTS.md` documents the single-node and k8s deploy paths but never tells an agent how to run the local dev stack. This PR: - adds a "Where Things Live" row routing to `bin/local-dev.sh` — the single entry point (infra in Docker; backend, frontend, and agent-service run natively); - expands the "Develop in a worktree" section to: - prefer `bin/local-dev.sh` while developing; - bounce the stack across worktree switches (`down` in the old worktree, `up` in the new one), since the native services bind fixed ports and share one PID/state dir, so only one worktree's stack runs at a time; - use the non-interactive CLI subcommands — the interactive TUI (`-i`) is for humans, not agents. - drops "Local" from the single-node / k8s deploy row — both can deploy anywhere, not only locally. Only the wrapper is surfaced, never the internal `bin/local-dev/main.sh`. ### Any related issues, documentation, discussions? Closes apache#6014 ### How was this PR tested? Docs-only change; no tests. Verified the linked `bin/local-dev/README.md` path resolves and that the described subcommands (`up` / `down` / `status` / `logs` / `-i`) match `bin/local-dev.sh --help`. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8)

…omote to errors (apache#5985) ### What changes were proposed in this PR? Fix the two remaining Angular extended-diagnostic warnings surfaced by the frontend build, add a unit test for the affected logic, and promote both checks to build errors so regressions can't silently creep back in: - **NG8113** — `CardItemComponent` listed `NzWaveDirective` in its standalone `imports` array but never used it in the template. Removed the unused directive import. - **NG8107** — `computing-unit-selection.component.html` used `pve.name?.trim()`, but `PveDraft.name` is a non-nullable `string`, so the optional chaining is redundant. Extracted the predicate into an `isCreateDisabled(pve)` component method bound via `[disabled]="isCreateDisabled(pve)"`, and added unit tests covering empty / whitespace / valid / padded names (the create button stays disabled until the env name has non-whitespace content). - **Enforcement** — set `unusedStandaloneImports` (NG8113) and `optionalChainNotNullable` (NG8107) to `"error"` under `angularCompilerOptions.extendedDiagnostics.checks`. Placed in `tsconfig.app.json` (the prod build, `build:ci`) rather than the base tsconfig, because `extendedDiagnostics` requires `strictTemplates`, which the unit-test builds deliberately disable; `tsconfig.test.json` clears the inherited option to avoid `NG4003`. No runtime behavior change. ### Any related issues, documentation, discussions? Closes apache#5984 ### How was this PR tested? - `ng test` (the affected spec): **46 passed**, including the 4 new `isCreateDisabled` cases. - `ng build`: succeeds with **0** NG8113/NG8107 diagnostics (now configured as errors) and produces the normal bundle. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@Yicong-Huang

…rkflowExecutionService (apache#5922) ### What changes were proposed in this PR? Consolidates execution init-error reporting into a **single site owned by `WorkflowExecutionService`**, replacing the two-phase split apache#5781 introduced (per @Yicong-Huang's suggestion, tracked in apache#5921). - `WorkflowExecutionService` now registers its `fatalErrors → WorkflowErrorEvent` (and state) diff handler as the **first** construction action. Construction itself does no external work and cannot throw — it only assigns `workflowSettings`, creates a `WebsocketInput` (a `PublishSubject`), and registers the handler. All throwing work lives in `executeWorkflow()`, which runs **after** `executionService.onNext(...)` publishes the execution, so any failure there is recorded by `errorHandler` into the metadata store and surfaced by the handler that `connectToExecution` forwards. - `WorkflowService.initExecutionService` drops the pre-publish `errorSubject` fallback: the `executionPublished` gating and `reportFatalErrorsToSubscribers` are gone, and the catch is simply `errorHandler(e)`. The now-unused `errorSubject` field and its `connect()` subscription are removed. - `WorkflowServiceSpec` is removed — it only tested the deleted `reportFatalErrorsToSubscribers`; the surfacing behavior is exercised by the integration/e2e suites. Net: a single reporting path (the metadata-store diff handler), with no change to the init-error surfacing apache#5781 added — construction is provably side-effect-free, so the pre-publish window no longer has a failure mode. ### Any related issues, documentation, discussions? Resolves apache#5921 — the follow-up refactor agreed during the apache#5781 review. ### How was this PR tested? `scalafmtCheckAll` clean. This is a behavior-preserving refactor of init-error reporting: the single reporting path is the metadata-store diff handler that already surfaced post-publish errors, and construction is side-effect-free so no error can escape it. The full compile, scalafix, and the integration/e2e suites that exercise init-error surfacing run in CI. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.8 in compliance with ASF. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

### What changes were proposed in this PR? The Codecov patch status was `informational: true`, so it never gated PRs. This makes it a real requirement: - set `target: 60%` on `coverage.status.patch.default`; - add `threshold: 10%` so the patch passes when coverage is at least `target - threshold` = 50%, softening the quantization on tiny diffs (e.g. a 2-coverable-line change passes at 1 line / 50%); - drop the `informational` flag so the patch check can fail. PRs must now cover at least 50% of the lines they change (60% target, 10% slack). Project status (auto target, 1% threshold), flag carryforward, and ignore rules are unchanged. The explanatory comment in `codecov.yml` is updated to match. ### Any related issues, documentation, discussions? Closes apache#6021 ### How was this PR tested? Config-only change; `codecov.yml` validated as well-formed YAML. Codecov applies the new policy on the next PR upload. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8)

…nt (apache#5949)  ### What changes were proposed in this PR?  > [Stacked on apache#5947] This PR consolidates the public and private dataset listings to use the shared CardItemComponent. - Replace texera-dataset-card-item with the shared read-only texera-card-item in the public hub. - Remove the unused DatasetCardItemComponent and its tests. - Align the hub list/card toggle with the workflow and private dataset listings. - Update the like button styling on the shared card. This keeps dataset cards consistent across all listings and reduces duplicated code. Demo: <img width="1258" height="400" alt="public-datasets" src="https://github.com/user-attachments/assets/49a083f1-d2a4-47f5-99a4-d70f90ccd947" /> ### Any related issues, documentation, discussions?  Closes apache#5948 ### How was this PR tested?  All existing tests passed, and the changes were manually tested. ### Was this PR authored or co-authored using generative AI tooling?  Generated-by: Claude Opus 4.8

…nent (apache#5566) ⚠️ This PR is stacked on apache#5574. Until that lands, the diff below may also include PR 5's QA/ranking task changes depending on which base GitHub is showing. The new code in this PR is the HuggingFaceComponent (task selector + model browser) under frontend/src/app/workspace/component/hugging-face/, plus the formly registration in formly-config.ts and the declaration in app.module.ts. Once PR apache#5574 merges and this PR is retargeted to main, the diff should auto-clean to the PR 6a frontend selector changes only. ### What changes were proposed in this PR? Add `HuggingFaceComponent`, a custom formly field type (`huggingface`) that provides: - A task dropdown listing all supported HuggingFace inference tasks (fetched from the Texera backend's `/huggingface/tasks` endpoint, with a static fallback list) - A paginated model list with client-side search, fetched from the Texera backend's `/huggingface/models` endpoint (which proxies HuggingFace Hub) - Per-task field state preservation — when switching tasks, previously entered values (modelId, promptColumn, etc.) are saved and restored This PR registers the component in `formly-config.ts` and declares it in `AppModule`. The component is not yet wired into the HuggingFace operator's property editor; the `jsonSchemaMapIntercept` mapping that routes the `modelId` field to this component is added in the follow-up property-editor PR (PR 7). ### Any related issues, documentation, discussions? - Tracking issue: apache#5314 - Closes: apache#5314 - Stacked on: PR tracked in issue apache#5292 - Parent issue: apache#5041 ### How was this PR tested? 7 unit tests added in `hugging-face.component.spec.ts` covering: - Static task list is non-empty and contains expected tasks (text-generation, image tasks, audio tasks, QA/ranking tasks) - Task tags are unique - Cache invalidation does not throw Run with `ng test`. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.7 --------- Signed-off-by: Elliot Lin <36275109+ELin2025@users.noreply.github.com> Co-authored-by: Elliot <36275109+Falcons-Royale@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

### What changes were proposed in this PR? Add `HuggingFaceAudioUploadComponent`, a custom formly field type (`huggingface-audio-upload`) that provides: - An audio file picker that uploads to the Texera backend's `/huggingface/upload-audio` endpoint for server-side storage - Authenticated audio preview/playback — fetches server-stored audio via `HttpClient` (which carries the JWT) and creates a blob URL for the `<audio>` element, avoiding 401s on workflow reload - Local audio preview using a browser Object URL while upload is in progress - Stale response guards — discards upload results and preview fetches if the user clears the field while a request is in flight - Concurrent upload protection — disables the file input and Clear button during upload - Storage of the returned server path in the formly form control - A guidance message explaining why audio is uploaded to backend storage rather than embedded in the workflow JSON This PR also registers `HuggingFaceComponent` and `HuggingFaceAudioUploadComponent` in `formly-config.ts` and declares them in `AppModule`. The `jsonSchemaMapIntercept` mapping that routes the `audioInput` field to this component is added in the follow-up property-editor PR (PR 7). ### Any related issues, documentation, discussions? - Tracking issue: apache#5314 - Closes: apache#5314 - Stacked on: apache#5566 - Parent issue: apache#5041 ### How was this PR tested? 38 unit tests in `hugging-face-audio-upload.component.spec.ts` covering: - `ngOnInit` — filename extraction from server paths, data URLs, empty/whitespace values - `previewSrc` — returns data URLs as-is, empty for server paths (loaded async), empty for blank values - `onFileSelected` — successful upload, non-audio rejection, concurrent upload guard, no-file guard, upload-in-progress state, error handling, filename fallback, model updates, Content-Type/URL encoding - `loadServerAudioPreview` — successful blob fetch, fetch failure error message, stale response discard on both success and error, no-fetch for data URLs, URL encoding - `clearAudio` — full state reset, error preservation, model clearing, dirty/touched marking - Stale upload guards — discard success/error when cleared during flight - `ngOnDestroy` — blob URL revocation - `getDisplayName` — forward/backslash paths, trailing separator, flat filename Run with `ng test`. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.6 --------- Co-authored-by: Elliot <36275109+Falcons-Royale@users.noreply.github.com> Co-authored-by: Anish Shivamurthy <anish@uci.edu> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e#6025) ### What changes were proposed in this PR? The usage banner in `bin/local-dev/main.sh` claimed Docker infra is **not** managed by the script and that you must bring it up yourself before `up`. That contradicts the actual code: `cmd_up` calls `infra_up` (which runs `docker compose up -d` against project `texera-local-dev`) and `cmd_down` calls `infra_down` for any docker targets. ``` Before: comment says "infra is NOT managed here -- bring those up yourself" After: comment says "infra IS managed: `up` brings it up, `down` tears it down" ``` Comment-only change; no behavior change. ### Any related issues, documentation, discussions? Related to apache#5960 (introduced `bin/local-dev` and this banner) and apache#6007 (wired `infra_up` into `up`/`auto`, which is what made the banner stale). No dedicated issue — documentation/comment-only fix (issue-first is excepted for docs). The sibling `bin/local-dev/README.md` already documents the correct behavior ("infra in docker, JVM + frontend native"); this just realigns the script's own banner. ### How was this PR tested? No tests — comment-only change. Verified the described behavior by reading `cmd_up` → `infra_up` (`docker compose up -d`, project `texera-local-dev`) and `cmd_down` → `infra_down`, and by running `bin/local-dev.sh up`, which brought up postgres/minio/lakefs/lakekeeper/litellm automatically (14/14 services healthy) with no manual `docker compose` step. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.8)

…at (apache#6027) ### What changes were proposed in this PR? Two small, agent/script-friendly additions to `bin/local-dev.sh` — no change to the human TTY experience. **1. `--json` machine-readable status.** `status --json` prints one JSON object on stdout (no colours, no table) and exits `0` iff every service is running, else `1`: ```json {"branch":"...","sha":"...","running":14,"total":14,"services":[ {"service":"texera-web","port":8080,"type":"jvm","pid":25823,"state":"running"}, ...]} ``` `up` and `down` also accept `--json`: human progress is routed to **stderr** (unbuffered) and the final status JSON goes to **stdout** (via a saved fd), so a caller can `up --json >state.json 2>progress.log` and parse stdout directly. **2. Non-TTY build heartbeat.** In non-TTY mode the spinner can't render in place, so a long silent step (`sbt dist`, output redirected to a log) used to print one line then go quiet for 25s+ — indistinguishable from "stuck" to a non-interactive caller. `tui_spinner` now emits `… still running (Ns)` every `TUI_HEARTBEAT_SECS` (default 15), polling at 1s so it still returns within ~1s of the job finishing (no trailing latency). ### Any related issues, documentation, discussions? Closes apache#6026. Usage banner (`--help`) updated to document `--json` on `status`/`up`/`down`. ### How was this PR tested? - `bash bin/local-dev/tests/test_local_dev_sh.sh` → **19 passed, 0 failed** (added 6: JSON shape/consistency, health-based exit code, unknown-flag negative case, `--help` coverage, heartbeat regression guard, up/down `--json` wiring). - `python -m pytest bin/local-dev/tests/` → **38 passed** (no regression). - Dogfooded end to end: `down --json` → stdout pure JSON (`running:0`, 14 `stopped`), exit 0; `up --json` → single-line JSON `running:14/14` on stdout, and stderr showed the new heartbeat across a ~60s build (`… still running (15s/30s/45s/60s)`), 14/14 healthy. - Verified stdout/stderr separation: `up --json >state.json 2>progress.log` yields parseable JSON in `state.json`. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.8)

carloea2 · 2026-06-30T06:21:54Z

I did not see testing of this PR, could you please add it?

Mrudhulraj · 2026-06-30T17:22:36Z

@carloea2 can you guide me here? I just did some manual testing and database checks here.

carloea2 · 2026-06-30T18:28:13Z

Manual checks are useful, but this needs an automated DB-backed regression spec.

Take a look at:
amber/src/test/scala/org/apache/texera/web/resource/dashboard/file/WorkflowResourceSpec.scala

That spec already uses MockTexeraDB from:
common/dao/src/test/scala/org/apache/texera/dao/MockTexeraDB.scala

Please add a regression test under amber/src/test/scala/org/apache/texera/web/resource/dashboard/, for example DatasetSearchQueryBuilderSpec.scala, using MockTexeraDB.

creates a public dataset owned by one user
adds a DATASET_USER_ACCESS row for the searching user
calls dashboardResource.searchAllResourcesCall(..., SearchQueryParams(resourceType = "dataset"), includePublic = true)
asserts the result contains exactly one dataset row for that did
asserts accessPrivilege is the searching user’s privilege

Yicong-Huang · 2026-07-01T08:11:15Z

Thanks @Mrudhulraj and @carloea2. Let's make sure we have test coverage on the code change.

github-actions Bot assigned Mrudhulraj Jun 29, 2026

github-actions Bot added engine fix labels Jun 29, 2026

Mrudhulraj changed the title ~~fix(query): Remove duplicated rows when dataset/workflows shared Publicly in the hub page~~ fix(query): Remove duplicated rows when datasets shared Publicly in the hub page Jun 29, 2026

This was referenced Jun 29, 2026

fix(query): Remove duplicated rows when workflows shared Publicly in the hub page #6017

Open

fix: Remove duplicated rows when dataset/workflows shared Publicly in the hub page #5962

Closed

xuang7 added the release/v1.2 back porting to release/v1.2 label Jun 29, 2026

Mrudhulraj and others added 10 commits June 29, 2026 21:44

fix(dashboard): Remove duplicated rows when dataset shared Publicly i…

d64f044

…n the hub page fix(dashboard): Remove duplicated rows when dataset shared Publicly in the hub page

Mrudhulraj force-pushed the fix/dataset-fetch-query-fix branch from a7bb653 to f65e408 Compare June 30, 2026 04:46

github-actions Bot added frontend Changes related to the frontend GUI ci changes related to CI docs Changes related to documentations infra labels Jun 30, 2026

Merge branch 'apache:main' into fix/dataset-fetch-query-fix

1723df7

github-actions Bot removed frontend Changes related to the frontend GUI ci changes related to CI docs Changes related to documentations infra labels Jun 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(query): Remove duplicated rows when datasets shared Publicly in the hub page#6016

fix(query): Remove duplicated rows when datasets shared Publicly in the hub page#6016
Mrudhulraj wants to merge 11 commits into
apache:mainfrom
Mrudhulraj:fix/dataset-fetch-query-fix

Mrudhulraj commented Jun 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

carloea2 commented Jun 29, 2026

Uh oh!

carloea2 commented Jun 29, 2026

Uh oh!

Mrudhulraj commented Jun 29, 2026

Uh oh!

codecov-commenter commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026 •

edited

Loading

Uh oh!

carloea2 commented Jun 30, 2026

Uh oh!

Mrudhulraj commented Jun 30, 2026

Uh oh!

carloea2 commented Jun 30, 2026

Uh oh!

Yicong-Huang commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Uh oh!

Conversation

Mrudhulraj commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this PR?

Issue - Duplicate datasets on hub landing page / hub search

Fix 1 — DatasetSearchQueryBuilder.constructFromClause

Any related issues, documentation, discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Automated Reviewer Suggestions

Uh oh!

carloea2 commented Jun 29, 2026

Uh oh!

carloea2 commented Jun 29, 2026

Uh oh!

Mrudhulraj commented Jun 29, 2026

Uh oh!

codecov-commenter commented Jun 29, 2026

⚠️ JUnit XML file not found

Uh oh!

github-actions Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Benchmark changes need a look

Uh oh!

carloea2 commented Jun 30, 2026

Uh oh!

Mrudhulraj commented Jun 30, 2026

Uh oh!

carloea2 commented Jun 30, 2026

Uh oh!

Yicong-Huang commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Mrudhulraj commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 29, 2026 •

edited

Loading

github-actions Bot commented Jun 30, 2026 •

edited

Loading